Using Locality-Sensitive Hashing for SVM Classification of Large Data Sets

Abstract

We propose a novel method that uses Locality-Sensitive Hashing (LSH) to solve the optimization problem arising in the training stage of support vector machines (SVMs) for large data sets, possibly in high dimensions. LSH was introduced as an efficient way to look for neighbors in high-dimensional spaces. Functions based on random projections create bins so that, with great probability, points belonging to the same bin are close, and points that are far apart will not be in the same bin. Based on these bins, it is not necessary to consider the whole original set but only representatives in each one of them, thus reducing the effective size of the data set. A key point of our proposal is that we work with the feature space and use only the projections to search for closeness in this space. Moreover, instead of choosing the projection directions at random, we sample a small subset and solve the associated SVM problem. Projecting onto this direction allows a more precise sampling in many cases, and an approximation of the solution of the large problem is found in a fraction of the running time with a small degradation of the classification error. We present two algorithms, together with theoretical support, and numerical experiments showing their performance on real-life problems taken from the LIBSVM database.
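
The binning idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it works in the input space rather than the feature space, takes the projection direction as a given argument instead of obtaining it from an SVM solved on a small subsample, and keeps the first point seen in each bin as the representative. The names `lsh_bin`, `bin_representatives`, `width`, and `offset` are hypothetical and chosen only for illustration.

```python
import math
import random


def lsh_bin(x, direction, offset, width):
    """Hash a point to an integer bin: floor((<direction, x> + offset) / width)."""
    projection = sum(d * xi for d, xi in zip(direction, x))
    return math.floor((projection + offset) / width)


def bin_representatives(points, labels, direction, width, seed=0):
    """Keep one representative per (bin, label) pair.

    Assumption: the representative is the first point seen in each bin;
    the paper may select representatives differently.
    """
    rng = random.Random(seed)
    offset = rng.uniform(0.0, width)  # random shift of the bin boundaries
    chosen = {}
    for x, y in zip(points, labels):
        key = (lsh_bin(x, direction, offset, width), y)
        chosen.setdefault(key, (x, y))  # later points in the same bin are dropped
    reps = list(chosen.values())
    return [x for x, _ in reps], [y for _, y in reps]


# Toy data: two tight clusters, one per class.
points = [(0.0, 0.0), (0.1, 0.0), (0.05, 0.2), (5.0, 5.0), (5.1, 4.9)]
labels = [-1, -1, -1, 1, 1]

# Assumed direction (1, 0); in the paper's setting it would come from a small SVM.
reps, rep_labels = bin_representatives(points, labels, direction=(1.0, 0.0), width=1.0)
print(len(points), "points reduced to", len(reps), "representatives")
```

An SVM would then be trained on `reps` and `rep_labels` instead of the full set. Keying the bins on the label as well as the hash value keeps points of opposite classes from being merged into a single representative.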

Similar resources

Locality-Sensitive Hashing for Protein Classification

Determination of sequence similarity is a central issue in computational biology, a problem addressed primarily through BLAST, an alignment based heuristic which has underpinned much of the analysis and annotation of the genomic era. Despite their success, alignment-based approaches scale poorly with increasing data set size, and are not robust under structural sequence rearrangements. Successi...

Large-Scale Distributed Locality-Sensitive Hashing for General Metric Data

Locality-Sensitive Hashing (LSH) is extremely competitive for similarity search, but works under the assumption of uniform access cost to the data, and for just a handful of dissimilarities for which locality-sensitive families are available. In this work we propose Parallel Voronoi LSH, an approach that addresses those two limitations of LSH: it makes LSH efficient for distributed-memory archit...

Robust and Efficient Locality Sensitive Hashing for Nearest Neighbor Search in Large Data Sets

Locality sensitive hashing (LSH) has been used extensively as a basis for many data retrieval applications. However, previous approaches, such as random projection and multi-probe hashing, may exhibit high query complexity of up to Θ(n) when the underlying data distribution is highly skewed. This is due to the imbalance in the number of data stored per each bucket, which leads to slow query tim...

Instance-Based Matching of Large Ontologies Using Locality-Sensitive Hashing

In this paper, we describe a mechanism for ontology alignment using instance based matching of types (or classes). Instance-based matching is known to be a useful technique for matching ontologies that have different names and different structures. A key problem in instance matching of types, however, is scaling the matching algorithm to (a) handle types with a large number of instances, and (b...

Hierarchical clustering of large text datasets using Locality-Sensitive Hashing

In this paper, we present a hierarchical clustering algorithm for large text datasets using Locality-Sensitive Hashing (LSH). The main idea of LSH is to “hash” items several times, in such a way that similar items are more likely to be hashed to the same bucket than dissimilar ones. The main drawback of the conventional hierarchical algorithms is a large time complexity (e.g. Single Linka...

Journal

Journal title: Mathematics

Year: 2022

ISSN: 2227-7390

DOI: https://doi.org/10.3390/math10111812